Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save/load nmethod without going through CodeBuffer #27

Closed
wants to merge 17 commits into from

Conversation

ashu-mehra
Copy link
Collaborator

@ashu-mehra ashu-mehra commented Dec 11, 2024

This is the prototype for storing and loading nmethods without going through the CodeBuffer.
The new implementation is protected by the flag -XX:+UseNewCode2.

Some numbers using this implementation:
spring-boot-getting-started [0] shows startup improvement of ~ 7.5%% and quarkus-getting-started [1] shows improvement of around ~3.5%.

Numbers for Springboot

Old build = /home/asmehra/data/ashu-mehra/leyden/build/nmethod-single-copy-load-release/images/jdk with options -XX:+UnlockDiagnosticVMOptions -XX:-UseNewCode2
New build = /home/asmehra/data/ashu-mehra/leyden/build/nmethod-single-copy-load-release/images/jdk with options -XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode2
Run,Old CDS + AOT,New CDS + AOT
1,544,512
2,554,525
3,550,508
4,571,515
5,550,506
6,552,515
7,568,510
8,554,517
9,551,506
10,556,508
Geomean,554.94,512.17
Stdev,7.90,5.65

Numbers for Quarkus:

Old build = /home/asmehra/data/ashu-mehra/leyden/build/nmethod-single-copy-load-release/images/jdk with options -XX:+UnlockDiagnosticVMOptions -XX:-UseNewCode2
New build = /home/asmehra/data/ashu-mehra/leyden/build/nmethod-single-copy-load-release/images/jdk with options -XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode2
Run,Old CDS + AOT,New CDS + AOT
1,359,346
2,360,353
3,373,357
4,376,356
5,366,353
6,360,356
7,361,361
8,347,349
9,355,336
10,374,342
Geomean,363.00,350.82
Stdev,8.70,7.27

-Xlog:init logs the load time from AOT code cache at JVM exit.
For spring-boot-getting-started without UseNewCode2:

[3.459s][info][init]     SC Load Time:           0.202 s
[3.459s][info][init]       nmethod register:       0.135 s
[3.459s][info][init]       find cached code:       0.007 s

For spring-boot-getting-started with UseNewCode2:

[3.192s][info][init] 
[3.192s][info][init]     SC Load Time:           0.138 s
[3.192s][info][init]       nmethod register:       0.111 s
[3.192s][info][init]       find cached code:       0.006 s

For quarkus-getting-started without UseNewCode2

[0.392s][info][init]     SC Load Time:           0.060 s
[0.392s][info][init]       nmethod register:       0.039 s
[0.392s][info][init]       find cached code:       0.002 s

For quarkus-getting-started with UseNewCode2

[0.386s][info][init]     SC Load Time:           0.033 s
[0.386s][info][init]       nmethod register:       0.027 s
[0.386s][info][init]       find cached code:       0.002 s

[0] https://github.com/openjdk/leyden/tree/premain/test/hotspot/jtreg/premain/spring-boot-getting-started
[1] https://github.com/openjdk/leyden/tree/premain/test/hotspot/jtreg/premain/quarkus-getting-started


Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/leyden.git pull/27/head:pull/27
$ git checkout pull/27

Update a local copy of the PR:
$ git checkout pull/27
$ git pull https://git.openjdk.org/leyden.git pull/27/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27

View PR using the GUI difftool:
$ git pr show -t 27

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/leyden/pull/27.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 11, 2024

👋 Welcome back asmehra! A progress list of the required criteria for merging this PR into premain will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Dec 11, 2024

@ashu-mehra This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

Save/load nmethod without going through CodeBuffer

Reviewed-by: adinn, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 2 new commits pushed to the premain branch:

  • e53efd9: Support pointer tagging in archived metaspace objects (for archived MethodData)
  • 78774b0: sync lambdaFormInvokers.cpp with mainline

Please see this link for an up-to-date comparison between the source branch of this pull request and the premain branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the premain branch, type /integrate in a new comment.

@ashu-mehra
Copy link
Collaborator Author

@adinn @vnkozlov fyi

@vnkozlov
Copy link
Collaborator

vnkozlov commented Jan 6, 2025

Hi @ashu-mehra
This is good POC. In general this looks good. My only concern is that SCnmethod is "copy" of nmethod so you have to copy values. Can you make SCnmethod as subclass of nmethod so you don't need to allocate it and copy offsets values which don't change? If it is difficult, please explain why.

I see that you use apply_relocations() to update relocations only once. Good.

@vnkozlov
Copy link
Collaborator

vnkozlov commented Jan 7, 2025

Can you make SCnmethod as subclass of nmethod so you don't need to allocate it and copy offsets values which don't change? If it is difficult, please explain why.

I may know answer myself. It is virtual table pointer since nmethod class has virtual methods.

Lets go with your SCnmethod for now.

@ashu-mehra
Copy link
Collaborator Author

It is virtual table pointer since nmethod class has virtual methods.

That's right. My next step is to try store nmethods as is and map them back which would require handling of vtable ptr. CDS already has capability to do that. So it should be feasible.

@openjdk
Copy link

openjdk bot commented Feb 21, 2025

@ashu-mehra this pull request can not be integrated into premain due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout nmethod-single-copy-load
git fetch https://git.openjdk.org/leyden.git premain
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge premain"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Feb 21, 2025
@bridgekeeper
Copy link

bridgekeeper bot commented Mar 4, 2025

@ashu-mehra This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Mar 6, 2025
@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Mar 17, 2025
@ashu-mehra ashu-mehra self-assigned this Mar 17, 2025
@ashu-mehra ashu-mehra force-pushed the nmethod-single-copy-load branch from acfe58c to 04b77ea Compare March 17, 2025 11:36
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Mar 17, 2025
@ashu-mehra ashu-mehra marked this pull request as ready for review March 17, 2025 11:39
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 17, 2025
@mlbridge
Copy link

mlbridge bot commented Mar 17, 2025

Webrevs

Signed-off-by: Ashutosh Mehra <[email protected]>
@ashu-mehra
Copy link
Collaborator Author

I have updated this PR to remove SCnmethod and store nmethod directly into AOT code cache.
Perf numbers have changed compared to the last prototype.
For spring boot:

Old build = /home/asmehra/data/ashu-mehra/leyden/build/premain-release/images/jdk with options
New build = /home/asmehra/data/ashu-mehra/leyden/build/nmethod-single-copy-load-release/images/jdk with options
Run,Old CDS + AOT,New CDS + AOT
1,544,532
2,548,526
3,558,534
4,550,529
5,541,532
6,540,540
7,539,542
8,554,526
9,540,532
10,544,530
Geomean,545.77,532.28
Stdev,6.18,5.02

For quarkus:

Old build = /home/asmehra/data/ashu-mehra/leyden/build/premain-release/images/jdk with options
New build = /home/asmehra/data/ashu-mehra/leyden/build/nmethod-single-copy-load-release/images/jdk with options
Run,Old CDS + AOT,New CDS + AOT
1,358,349
2,355,356
3,360,347
4,372,342
5,357,353
6,344,336
7,356,353
8,352,339
9,359,338
10,347,340
Geomean,355.93,345.23
Stdev,7.27,6.84

This shows an improvement of 2.5~3%.

SC Load Time as reported by -Xlog:init shows it is cut down to almost half.
Springboot without this change:

[0.810s][info][init]     SC Load Time:           0.127 s
[0.810s][info][init]       nmethod register:       0.091 s
[0.810s][info][init]       find cached code:       0.003 s

Springboot with this change:

[0.800s][info][init]     SC Load Time:           0.064 s
[0.800s][info][init]       nmethod register:       0.055 s
[0.800s][info][init]       find cached code:       0.003 s

Quarkus without this change:

[0.392s][info][init]     SC Load Time:           0.056 s
[0.392s][info][init]       nmethod register:       0.040 s
[0.392s][info][init]       find cached code:       0.002 s

Quarkus with this change:

[0.383s][info][init]     SC Load Time:           0.027 s
[0.383s][info][init]       nmethod register:       0.024 s
[0.383s][info][init]       find cached code:       0.002 s

@vnkozlov
Copy link
Collaborator

Nice.

@ashu-mehra do you know why "New CDS + AOT" results for SpringBoot is worse than before: geo mean 532.28 vs 512.17?

@ashu-mehra
Copy link
Collaborator Author

do you know why "New CDS + AOT" results for SpringBoot is worse than before: geo mean 532.28 vs 512.17?

I noticed this but at the moment I don't know the reason for this increase in startup time.
The SC Load Time has gone down on expected lines. So may be there are other factors playing a role here.
I am trying to get my earlier prototype working with latest premain to see if I still get the same numbers as before.

@ashu-mehra
Copy link
Collaborator Author

I am trying to get my earlier prototype working with latest premain to see if I still get the same numbers as before.

@vnkozlov I used the first prototype but I am not getting same numbers as mentioned in #27 (comment)
The startup time and SC Load time numbers are comparable to what I am getting with the current patch. So I don't think the current patch is any worse than the earlier prototype (and it shouldn't be!).

@vnkozlov
Copy link
Collaborator

@ashu-mehra how complex to do the same for adapters?

@ashu-mehra
Copy link
Collaborator Author

how complex to do the same for adapters?

I would say it should be easier compared to nmethods. I actually did the first prototype for adapters before trying out nmethods, but it was not storing adapter blobs in the aot code cache. I can resurrect that old patch for adapters and update it to store adapter blobs, as we are doing now for nmethods.

bool write_oop_map_set(nmethod* nm);
bool write_nmethod_reloc_immediates(GrowableArray<oop>& oop_list, GrowableArray<Metadata*>& metadata_list);
bool write_nmethod_extra_relocations(nmethod* nm, GrowableArray<oop>& oop_list, GrowableArray<Metadata*>& metadata_list);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write_nmethod_extra_relocations name doesn't really say what the extra relocs are for i.e. to ensure that at load time we correctly relocate a reference into the current runtime. Could we rename this to write_nmethod_loadtime_relocations.

Copy link
Collaborator

@adinn adinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 20, 2025
@vnkozlov
Copy link
Collaborator

how complex to do the same for adapters?

I would say it should be easier compared to nmethods. I actually did the first prototype for adapters before trying out nmethods, but it was not storing adapter blobs in the aot code cache. I can resurrect that old patch for adapters and update it to store adapter blobs, as we are doing now for nmethods.

Thank you, @ashu-mehra
I am almost ready to create PR for mainline with changes for adapters caching. It works now for simple blob. I need to clean it up. After that you can do that this direct saving of adapters there.

Copy link
Collaborator

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

Comment on lines 1371 to 1375
SCCEntry* scc_entry = scc_reader->scc_entry();
const char* reloc_addr = scc_reader->addr_of_entry_offset(scnm->relocation_data_offset());
SCCache::copy_bytes(reloc_addr, (address)relocation_begin(), scnm->relocation_size());
const char* content_addr = scc_reader->addr_of_entry_offset(scnm->content_offset());
SCCache::copy_bytes(content_addr, content_begin(), scnm->content_size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why copy them separately?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean they usually collocated.

Comment on lines 3197 to 3218
uint buffer_offset = entry_position + scnm->oop_metadata_offset();
set_read_position(buffer_offset);
if (!read_oop_metadata_list(target, oop_list, metadata_list, oop_recorder)) {
return false;
}

buffer_offset = entry_position + scnm->reloc_immediates_offset();
set_read_position(buffer_offset);
if (!read_oop_metadata_list(target, reloc_immediate_oop_list, reloc_immediate_metadata_list, nullptr)) {
return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you need to create these intermediate arrays for oops and metadata? Can you cache oop_recorder data as we did before? Or I missing something here.

@vnkozlov
Copy link
Collaborator

Github somehow messed up the order of comments. Ignore my [Outdated] code comments - they we from Draft review.

@vnkozlov
Copy link
Collaborator

how complex to do the same for adapters?

I would say it should be easier compared to nmethods. I actually did the first prototype for adapters before trying out nmethods, but it was not storing adapter blobs in the aot code cache. I can resurrect that old patch for adapters and update it to store adapter blobs, as we are doing now for nmethods.

Thank you, @ashu-mehra I am almost ready to create PR for mainline with changes for adapters caching. It works now for simple blob. I need to clean it up. After that you can do that this direct saving of adapters there.

openjdk/jdk#24158

@ashu-mehra
Copy link
Collaborator Author

Thanks @adinn for finding out the missing code to flush icache. I have updated the patch to flush the icache and that fixes the intermittent crashes seen on aarch64. I think this patch is now ready to be pushed to premain branch.

@ashu-mehra
Copy link
Collaborator Author

@adinn or @vnkozlov can either of you please review the changes again as I have pushed few more commits since the approval.

@shipilev
Copy link
Member

ARM32 build is failing, I think because base for this PR misses adb8107 -- pull it in to get clean GHA run :)

Copy link
Collaborator

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@ashu-mehra
Copy link
Collaborator Author

I got some numbers for aarch64 as well and they are in similar range as for x86-64

On a 128 cpu server:

Old build = /tmp/ashu/leyden/build/premain-release/images/jdk with options
New build = /tmp/ashu/leyden/build/nmethod-single-copy-load-release/images/jdk with options
Run,Old CDS + AOT,New CDS + AOT       
1,463,444                                                                                                                                                                                                                                                     2,457,443                                                                                                                                                                                                                                                     3,465,453                                                                                                                      
4,452,447                                                                                                                      
5,460,446                                                                                                                      
6,458,443                      
7,465,445                                                                                                                                                                                                                                                     
8,464,454
9,461,449
10,455,449
Geomean,459.98,447.28
Stdev,4.22,3.72

On same 128 cpu server but bound to 0-7 cpus:

Old build = /tmp/ashu/leyden/build/premain-release/images/jdk with options
New build = /tmp/ashu/leyden/build/nmethod-single-copy-load-release/images/jdk with options
Run,Old CDS + AOT,New CDS + AOT
1,441,424
2,449,420
3,435,427
4,429,427
5,429,430
6,437,424
7,441,425
8,447,426
9,429,432
10,438,421
Geomean,437.45,425.59
Stdev,6.86,3.50

Load time as reported by Xlog:init for premain:

[0.680s][info][init]     SC Load Time:           0.210 s                                                                       
[0.680s][info][init]       nmethod register:       0.148 s                                                                     
[0.680s][info][init]       find cached code:       0.004 s  

Load time as reported by Xlog:init for this PR:

[0.675s][info][init]     SC Load Time:           0.143 s                                                                       
[0.675s][info][init]       nmethod register:       0.128 s                                                                     
[0.675s][info][init]       find cached code:       0.004 s

@ashu-mehra
Copy link
Collaborator Author

/integrate

@openjdk
Copy link

openjdk bot commented Mar 31, 2025

Going to push as commit b55a514.
Since your change was applied there have been 3 commits pushed to the premain branch:

  • 2c2289b: Backport some of the JEP2 review changes
  • e53efd9: Support pointer tagging in archived metaspace objects (for archived MethodData)
  • 78774b0: sync lambdaFormInvokers.cpp with mainline

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 31, 2025
@openjdk openjdk bot closed this Mar 31, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 31, 2025
@openjdk
Copy link

openjdk bot commented Mar 31, 2025

@ashu-mehra Pushed as commit b55a514.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants